Incremental Autonomous Exploration


Improved Sample Complexity for Incremental Autonomous Exploration in MDPs

Neural Information Processing Systems

We study the problem of exploring an unknown environment when no reward function is provided to the agent. Building on the incremental exploration setting introduced by Lim and Auer (2012), we define the objective of learning the set of $\epsilon$-optimal goal-conditioned policies attaining all states that are incrementally reachable within $L$ steps (in expectation) from a reference state $s_0$. In this paper, we introduce a novel model-based approach that interleaves discovering new states from $s_0$ and improving the accuracy of a model estimate that is used to compute goal-conditioned policies. The resulting algorithm, DisCo, achieves a sample complexity scaling as $\widetilde{O}_{\epsilon}(L^5 S_{L\epsilon} \Gamma_{L\epsilon} A \epsilon^{-2})$, where $A$ is the number of actions, $S_{L\epsilon}$ is the number of states that are incrementally reachable from $s_0$ in $L\epsilon$ steps, and $\Gamma_{L\epsilon}$ is the branching factor of the dynamics over such states. This improves over the algorithm proposed in (Lim and Auer, 2012) in both $\epsilon$ and $L$ at the cost of an extra $\Gamma_{L\epsilon}$ factor, which is small in most environments of interest.
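The two model-based ingredients the abstract mentions — an empirical transition estimate and goal-conditioned policies computed from it — can be sketched in a few lines. This is only a minimal illustration, not the paper's DisCo algorithm: the helper names `empirical_model` and `steps_to_goal` are hypothetical, and DisCo additionally maintains confidence intervals around the estimate while interleaving discovery of new states.

```python
import numpy as np

def empirical_model(counts):
    """Empirical transition probabilities P[s, a, s'] from visit counts.
    Rows with no samples are left at zero (a sketch-level simplification)."""
    totals = counts.sum(axis=2, keepdims=True)
    return np.divide(counts, np.maximum(totals, 1))

def steps_to_goal(P, goal, iters=200):
    """Value iteration for the expected number of steps to reach `goal`
    under model P, i.e. a goal-conditioned shortest-path policy."""
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = 1.0 + P @ V          # Q[s, a] = 1 + sum_s' P(s'|s, a) * V(s')
        Q[goal] = 0.0            # the goal state is absorbing at zero cost
        V = Q.min(axis=1)
    return V, Q.argmin(axis=1)   # expected steps-to-goal and greedy policy
```

On a deterministic three-state chain (0 → 1 → 2 with the goal at state 2), the computed values are the hitting times 2, 1, 0.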


Layered State Discovery for Incremental Autonomous Exploration

Chen, Liyu, Tirinzoni, Andrea, Lazaric, Alessandro, Pirotta, Matteo

arXiv.org Artificial Intelligence

We study the autonomous exploration (AX) problem proposed by Lim & Auer (2012). In this setting, the objective is to discover a set of $\epsilon$-optimal policies reaching a set $\mathcal{S}_L^{\rightarrow}$ of incrementally $L$-controllable states. We introduce a novel layered decomposition of the set of incrementally $L$-controllable states that is based on the iterative application of a state-expansion operator. We leverage these results to design Layered Autonomous Exploration (LAE), a novel algorithm for AX that attains a sample complexity of $\tilde{\mathcal{O}}(LS^{\rightarrow}_{L(1+\epsilon)}\Gamma_{L(1+\epsilon)} A \ln^{12}(S^{\rightarrow}_{L(1+\epsilon)})/\epsilon^2)$, where $S^{\rightarrow}_{L(1+\epsilon)}$ is the number of states that are incrementally $L(1+\epsilon)$-controllable, $A$ is the number of actions, and $\Gamma_{L(1+\epsilon)}$ is the branching factor of the transitions over such states. LAE improves over the algorithm of Tarbouriech et al. (2020a) by a factor of $L^2$ and it is the first algorithm for AX that works in a countably-infinite state space. Moreover, we show that, under a certain identifiability assumption, LAE achieves minimax-optimal sample complexity of $\tilde{\mathcal{O}}(LS^{\rightarrow}_{L}A\ln^{12}(S^{\rightarrow}_{L})/\epsilon^2)$, outperforming existing algorithms and matching for the first time the lower bound proved by Cai et al. (2022) up to logarithmic factors.
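As a toy illustration of the layered construction, one can iterate a simplified state-expansion operator on a known MDP: a state joins the controllable set once it is reachable from $s_0$ within $L$ steps along a path whose intermediate states are already controllable. The sketch below assumes deterministic dynamics (so "within $L$ steps in expectation" reduces to path length) and uses hypothetical helper names; it ignores the stochasticity and estimation issues the paper actually handles.

```python
from collections import deque

def expand(controllable, transitions, L, s0):
    """One application of a simplified state-expansion operator:
    add every state reachable from s0 in at most L steps via paths
    whose intermediate states are already in `controllable`.
    `transitions` maps state -> {action: next_state} (deterministic)."""
    dist = {s0: 0}
    frontier = deque([s0])
    while frontier:
        s = frontier.popleft()
        if dist[s] >= L:
            continue
        for nxt in transitions.get(s, {}).values():
            if nxt not in dist:
                dist[nxt] = dist[s] + 1
                # Newly reached states are recorded, but only states that
                # are already controllable may be traversed further.
                if nxt in controllable:
                    frontier.append(nxt)
    return controllable | set(dist)

def incrementally_controllable(transitions, L, s0):
    """Iterate the operator to a fixed point, mimicking the layered
    decomposition of the incrementally L-controllable states."""
    K = {s0}
    while True:
        K_new = expand(K, transitions, L, s0)
        if K_new == K:
            return K
        K = K_new
```

On a five-state chain 0 → 1 → 2 → 3 → 4 with $L = 2$, the fixed point is {0, 1, 2}: each iteration adds one new layer, and states more than $L$ steps from $s_0$ are never controllable.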